Chapter 9 - ARIMA Models
Forecasting: Principles and Practice

R. J. Serrano

05/12/2023

ARIMA models —

Learning objectives:

Introduction —

ARIMA models provide another approach to time series forecasting. While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.

9.1 - Stationarity and differencing —

A stationary time series is one whose statistical properties do not depend on the time at which the series is observed.

Thus, a stationary time series has a roughly horizontal pattern (no trend), constant variance, and no predictable patterns such as seasonality in the long term.

Source: What is stationarity? https://youtu.be/aIdTGKjQWjA

Figure 9.1: Which of these series are stationary?

Differencing

In Figure 9.1, note that the Google stock price was non-stationary in panel (a), but the daily changes were stationary in panel (b). This shows one way to make a non-stationary time series stationary — compute the differences between consecutive observations. This is known as differencing.

google_2018 <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) == 2018)

# original time series
google_2018 |>
  autoplot(Adj_Close) +
  labs(title = "Google closing stock price ($USD)")

google_2018 |> ACF(Close) |>
  autoplot() + labs(subtitle = "Google closing stock price")

google_2018 |> ACF(difference(Close)) |>
  autoplot() + labs(subtitle = "Changes in Google closing stock price")

google_2018 |>
  mutate(diff_close = difference(Close)) |>
  features(diff_close, ljung_box, lag = 8)

The ACF of the differenced Google stock price looks just like that of a white noise series. Since the p-value is greater than 0.05, this suggests that the daily change in the Google stock price is essentially a random amount which is uncorrelated with that of previous days.

Random walk model —-

A time series is said to follow a random walk process if the predicted value of the series in one period is equivalent to the value of the series in the previous period plus a random error. Source: https://analystprep.com/study-notes/cfa-level-2/random-walk-process/

If the differenced series is white noise with zero mean, the model for the original series can be written as \[ y_t - y_{t-1} = \varepsilon_t, \quad \text{i.e.} \quad y_t = y_{t-1} + \varepsilon_t, \] where \(\varepsilon_t \sim NID(0,\sigma^2)\). This is a random walk model.

Random walk models are widely used for non-stationary data, particularly financial and economic data. Random walks typically have long periods of apparent trends up or down, and sudden, unpredictable changes in direction.
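As a sketch (assuming the fpp3 packages are loaded), a random walk model can be fitted with fable's NAIVE(), whose point forecasts are simply the last observed value. Because gafa_stock has irregular trading days, we re-index by trading day before modelling:

```r
library(fpp3)  # loads fable, tsibble, feasts, etc.

google_2018 <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) == 2018) |>
  mutate(day = row_number()) |>
  update_tsibble(index = day, regular = TRUE)  # trading days are irregular

# NAIVE() fits a random walk; RW(Close ~ drift()) would add a drift term
google_2018 |>
  model(NAIVE(Close)) |>
  forecast(h = 10) |>
  autoplot(google_2018)
```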

Seasonal differencing —-

A seasonal difference is the difference between an observation and the corresponding observation from the previous year: \[ y'_t = y_t - y_{t-m} \] where \(m\) is the number of seasons in a year.

Antidiabetic drug sales

a10 <- PBS |>
  filter(ATC2 == "A10") |>
  summarise(Cost = sum(Cost) / 1e6)

a10 |> autoplot(Cost) +
  labs(title = "Australian monthly scripts for A10 (antidiabetic) drugs sold")

a10 |> autoplot(log(Cost)) +
  labs(title = "Australian monthly scripts for A10 (antidiabetic) drugs sold (log scale)")

a10 |> autoplot(
  log(Cost) |> difference(12)
) +
  labs(title = "Annual change in monthly scripts for A10 (antidiabetic) drugs sold")

Unit root test —-

One way to determine more objectively whether differencing is required is to use a unit root test. These are statistical hypothesis tests of stationarity that are designed for determining whether differencing is required.

Source: textbook Chapter 9.1

For example, let us apply it to the Google stock price data.

google_2018 |>
  features(Close, unitroot_kpss)

The p-value is less than 0.05, so the null hypothesis (the time series is stationary) is rejected in favor of the alternative hypothesis (the time series is not stationary).

We can difference the time series and apply the test again.

google_2018 |>
  mutate(diff_close = difference(Close)) |>
  features(diff_close, unitroot_kpss)

This time, the p-value is greater than 0.05, so we cannot reject the null hypothesis: the differenced series appears stationary.

Exercise:

For the tourism dataset, compute the total number of trips and find an appropriate differencing (after transformation if necessary) to obtain stationary data.
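One possible starting point (a sketch, not the only valid answer, assuming the fpp3 packages are loaded): aggregate the tourism data to total trips, then let the unit root features suggest how many seasonal and first differences are needed:

```r
library(fpp3)

total_trips <- tourism |>
  summarise(Trips = sum(Trips))  # aggregate over Region/State/Purpose

# how many seasonal differences are suggested?
total_trips |> features(Trips, unitroot_nsdiffs)

# after that, how many first differences?
total_trips |> features(Trips, unitroot_ndiffs)
```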

9.2 - Backshift notation

A very useful notational device is the backward shift operator, \(B\), which is used as follows: \[ B y_{t} = y_{t - 1} \]

In other words, \(B\), operating on \(y_{t}\), has the effect of shifting the data back one period.

Two applications of \(B\) to \(y_{t}\) shifts the data back two periods: \[ B(By_{t}) = B^{2}y_{t} = y_{t-2} \]
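In this notation, differencing becomes compact. A first difference is \[ y'_t = y_t - y_{t-1} = (1 - B)y_t, \] a \(d\)th-order difference is \((1 - B)^d y_t\), and a seasonal difference is \(y_t - y_{t-m} = (1 - B^m)y_t\).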

9.3 - Autoregressive models

In a multiple regression model, introduced in Chapter 7, we forecast the variable of interest using a linear combination of predictors. In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable. The term autoregression indicates that it is a regression of the variable against itself.

AR(1) model

An autoregressive model of order \(p\), written AR(\(p\)), is \[ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t, \] where \(\varepsilon_t\) is white noise. The AR(1) special case is \(y_t = c + \phi_1 y_{t-1} + \varepsilon_t\); with \(\phi_1 = 1\) and \(c = 0\), it reduces to a random walk.
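As a base-R sketch (no fable needed), AR(1) series with different values of \(\phi_1\) can be simulated with stats::arima.sim() to see how the parameter changes the behaviour of the series:

```r
set.seed(42)

# simulate 200 observations from an AR(1) process with phi1 = 0.8
y_smooth <- arima.sim(model = list(ar = 0.8), n = 200)

# phi1 closer to 0 gives a series that looks more like white noise
y_noisy <- arima.sim(model = list(ar = 0.1), n = 200)

plot.ts(cbind(y_smooth, y_noisy), main = "Simulated AR(1) series")
```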

9.4 - Moving average models

Rather than using past values of the forecast variable, a moving average model uses past forecast errors in a regression-like model: \[ y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}, \] where \(\varepsilon_t\) is white noise. We refer to this as an MA(\(q\)) model.

Invertibility

An MA(\(q\)) model is invertible if it can be rewritten as a convergent AR(\(\infty\)) process; for an MA(1) model this requires \(|\theta_1| < 1\). With an invertible model, the most recent observations carry more weight than observations from the more distant past.

9.5 - Non-seasonal ARIMA models

If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model. ARIMA is an acronym for AutoRegressive Integrated Moving Average (in this context, “integration” is the reverse of differencing).

The full model can be written as \[ y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t, \] where \(y'_t\) is the differenced series (it may have been differenced more than once). We call this an ARIMA(\(p,d,q\)) model, where \(p\) is the order of the autoregressive part, \(d\) the degree of first differencing, and \(q\) the order of the moving average part.

Example: Egyptian exports

global_economy |>
     filter(Code == "EGY") |>
     autoplot(Exports) +
     labs(y = "% of GDP", title = "Egyptian exports")

The following R code selects a non-seasonal ARIMA model automatically.

fit <- global_economy |>
     filter(Code == "EGY") |>
     model(ARIMA(Exports))

report(fit)
## Series: Exports 
## Model: ARIMA(2,0,1) w/ mean 
## 
## Coefficients:
##          ar1      ar2      ma1  constant
##       1.6764  -0.8034  -0.6896    2.5623
## s.e.  0.1111   0.0928   0.1492    0.1161
## 
## sigma^2 estimated as 8.046:  log likelihood=-141.57
## AIC=293.13   AICc=294.29   BIC=303.43

Fitted model residual plots

fit |> 
     gg_tsresiduals()

Forecast for 10 years (periods)

fit |> forecast(h=10) |>
     autoplot(global_economy) +
     labs(y = "% of GDP", title = "Egyptian exports")

ACF and PACF plots

It is usually not possible to tell, simply from a time plot, what values of \(p\) and \(q\) are appropriate for the data. However, it is sometimes possible to use the ACF plot, and the closely related PACF plot, to determine appropriate values for \(p\) and \(q\).

global_economy |>
     filter(Code == "EGY") |>
     gg_tsdisplay(Exports, plot_type = "partial")

We can detect a decaying sinusoidal pattern in the ACF, and the PACF shows the last significant spike at lag 4. This is what you would expect from an ARIMA(4,0,0) model.

fit2 <- global_economy |>
     filter(Code == "EGY") |>
     model(ARIMA(Exports ~ pdq(4,0,0)))

report(fit2)
## Series: Exports 
## Model: ARIMA(4,0,0) w/ mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4  constant
##       0.9861  -0.1715  0.1807  -0.3283    6.6922
## s.e.  0.1247   0.1865  0.1865   0.1273    0.3562
## 
## sigma^2 estimated as 7.885:  log likelihood=-140.53
## AIC=293.05   AICc=294.7   BIC=305.41

This model is only slightly worse than the ARIMA(2,0,1) model identified by ARIMA() (with an AICc value of 294.70 compared to 294.29).

9.7 - ARIMA modelling in fable

The ARIMA() function in the fable package uses a variation of the Hyndman-Khandakar algorithm (Hyndman & Khandakar, 2008), which combines unit root tests, minimisation of the AICc and MLE to obtain an ARIMA model. The arguments to ARIMA() provide for many variations on the algorithm. What is described here is the default behaviour.

Modelling procedure

When fitting an ARIMA model to a set of (non-seasonal) time series data, the following procedure provides a useful general approach.

Source: Figure 9.12

Portmanteau tests of residuals for ARIMA models

With ARIMA models, more accurate portmanteau tests are obtained if the degrees of freedom of the test statistic are adjusted to account for the number of parameters in the model. Specifically, we use \(\ell - K\) degrees of freedom in the test, where \(\ell\) is the number of lags and \(K\) is the number of AR and MA parameters in the model. So for the non-seasonal models we have considered so far, \(K = p + q\). The value of \(K\) is passed to the ljung_box function via the dof argument, as shown in the example below.

Example: Central African Republic exports

global_economy |>
     filter(Code == "CAF") |>
     autoplot(Exports) +
     labs(title="Central African Republic exports",
          y="% of GDP")

  1. The time plot shows some non-stationarity, with an overall decline. The improvement in 1994 was due to a new government which overthrew the military junta and had some initial success, before unrest caused further economic decline.

  2. There is no evidence of changing variance, so we will not do a Box-Cox transformation.

  3. To address the non-stationarity, we will take a first difference of the data.

global_economy |>
     filter(Code == "CAF") |>
     gg_tsdisplay(difference(Exports), plot_type='partial')

  4. The PACF shown in the above figure is suggestive of an AR(2) model; so an initial candidate model is an ARIMA(2,1,0). The ACF suggests an MA(3) model; so an alternative candidate is an ARIMA(0,1,3).

  5. We fit both an ARIMA(2,1,0) and an ARIMA(0,1,3) model, along with two automated model selections: one using the default stepwise procedure, and one working harder to search a larger model space.

caf_fit <- global_economy |>
     filter(Code == "CAF") |>
     model(arima210 = ARIMA(Exports ~ pdq(2,1,0)),
           arima013 = ARIMA(Exports ~ pdq(0,1,3)),
           stepwise = ARIMA(Exports),
           search = ARIMA(Exports, stepwise=FALSE))

caf_fit |> pivot_longer(!Country, names_to = "Model name",
                         values_to = "Orders")
caf_fit |> 
     glance() |> 
     arrange(AICc) |> 
     select(.model:BIC)

The four models have almost identical AICc values. Of the models fitted, the full search has found that an ARIMA(3,1,0) gives the lowest AICc value, closely followed by the ARIMA(2,1,0) and ARIMA(0,1,3) — the latter two being the models that we guessed from the ACF and PACF plots. The automated stepwise selection has identified an ARIMA(2,1,2) model, which has the highest AICc value of the four models.

  6. The ACF plot of the residuals from the ARIMA(3,1,0) model shows that all autocorrelations are within the threshold limits, indicating that the residuals behave like white noise.

caf_fit |>
     select(search) |>
     gg_tsresiduals()

A portmanteau test (setting K=3) returns a large p-value, also suggesting that the residuals are white noise.

augment(caf_fit) |>
     filter(.model=='search') |>
     features(.innov, ljung_box, lag = 10, dof = 3)

Forecast for 5 years (periods)

caf_fit |>
     forecast(h=5) |>
     filter(.model=='search') |>
     autoplot(global_economy)

Note that the mean forecasts look very similar to what we would get with a random walk (equivalent to an ARIMA(0,1,0)). The extra work to include AR and MA terms has made little difference to the point forecasts in this example, although the prediction intervals are much narrower than for a random walk model.
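As a quick visual check of this point (a sketch assuming the fpp3 packages are loaded), the random walk benchmark can be fitted alongside the ARIMA model and the two forecasts compared directly:

```r
library(fpp3)

caf <- global_economy |> filter(Code == "CAF")

caf |>
  model(search = ARIMA(Exports, stepwise = FALSE),
        rw     = RW(Exports)) |>   # ARIMA(0,1,0) benchmark
  forecast(h = 5) |>
  autoplot(caf, level = 95)
```

The point forecasts should nearly coincide, while the random walk's 95% intervals are visibly wider.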